Bayesian inference on quasi-sparse count data
Abstract
There is growing interest in analysing high-dimensional count data, which often exhibit quasi-sparsity corresponding to an overabundance of zeros and small nonzero counts. Existing methods for analysing multivariate count data via Poisson or negative binomial log-linear hierarchical models with zero-inflation cannot flexibly adapt to quasi-sparse settings. We develop a new class of continuous local-global shrinkage priors tailored to quasi-sparse counts. Theoretical properties are assessed, including flexible posterior concentration and stronger control of false discoveries in multiple testing. Simulation studies demonstrate excellent small-sample properties relative to competing methods. We use the method to detect rare mutational hotspots in exome sequencing data and to identify North American cities most impacted by terrorism.
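The abstract does not give the form of the proposed prior, so the sketch below is only a hedged illustration of the general local-global shrinkage idea for counts. It is not the authors' method. It assumes a hypothetical conjugate hierarchy y_i | theta_i ~ Poisson(theta_i), theta_i ~ Gamma(shape = alpha, rate = beta_i) with beta_i = 1 / (tau * lambda_i), where tau is a global scale shared by all counts and lambda_i is a local scale for count i. Conditional on the scales, conjugacy gives theta_i | y_i ~ Gamma(alpha + y_i, beta_i + 1). The posterior mean (alpha + y_i) / (beta_i + 1) is therefore a weighted average of the prior mean alpha / beta_i and the observed count, with weight kappa_i = beta_i / (1 + beta_i) on the prior mean. The function `poisson_gamma_shrinkage` and its parameter names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShrinkageResult:
    posterior_mean: float  # E[theta | y, tau, lam]
    kappa: float  # weight on the prior mean; near 1 means strong shrinkage


def poisson_gamma_shrinkage(y: int, alpha: float, tau: float, lam: float) -> ShrinkageResult:
    """Posterior mean of a Poisson rate given fixed global and local scales.

    Hypothetical hierarchy for illustration only (not the paper's prior):
        y | theta        ~ Poisson(theta)
        theta | tau, lam ~ Gamma(shape=alpha, rate=beta), beta = 1 / (tau * lam)
    Conjugacy gives theta | y ~ Gamma(shape=alpha + y, rate=beta + 1).
    """
    if y < 0 or alpha <= 0 or tau <= 0 or lam <= 0:
        raise ValueError("need y >= 0 and positive alpha, tau, lam")
    beta = 1.0 / (tau * lam)  # prior rate; the prior mean is alpha / beta
    kappa = beta / (1.0 + beta)  # shrinkage factor toward the prior mean
    posterior_mean = (alpha + y) / (beta + 1.0)
    return ShrinkageResult(posterior_mean, kappa)


if __name__ == "__main__":
    tau = 0.1  # small global scale: pulls every rate toward zero by default
    # (count, local scale): a large local scale lets a big count escape shrinkage
    for y, lam in [(0, 1.0), (1, 1.0), (40, 1000.0)]:
        r = poisson_gamma_shrinkage(y, alpha=0.5, tau=tau, lam=lam)
        print(f"y={y:3d} lam={lam:7.1f} kappa={r.kappa:.3f} E[theta|y]={r.posterior_mean:.3f}")
```

With a small global scale (tau = 0.1), a zero count with an ordinary local scale (lam = 1) gets kappa = 10/11 and a posterior mean of about 0.045. A count of 40 with a large local scale (lam = 1000) keeps kappa below 0.01 and a posterior mean of about 40.1. This is the qualitative behaviour a local-global prior aims for on quasi-sparse data: zeros and small counts are shrunk strongly, while genuine large counts are left nearly untouched. In a full analysis the scales would be given priors and integrated over rather than fixed.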
Similar resources
Bayesian functional principal components analysis for binary and count data
Recently, van der Linde (2008) proposed a variational algorithm to obtain approximate Bayesian inference in functional principal components analysis (FPCA), where the functions were observed with Gaussian noise. Generalized FPCA under different noise models with sparse longitudinal data was developed by Hall, Müller and Yao (2008), but no Bayesian approach is available yet. It is demonstrated t...
Estimation of confidence intervals for Multinomial proportions of sparse contingency tables using Bayesian methods
The multinomial distribution, widely used in applications with discrete data, has seen a variety of competing interval methods, from frequentist to Bayesian, which still prove interesting in the case of zero counts or sparse contingency tables. The methods commonly recommended in both approaches are considered based on the influence of zero counts, polarizing cell counts, and aberrations. The infe...
Inference in generalized additive mixed models by using smoothing splines
Generalized additive mixed models are proposed for overdispersed and correlated data, which arise frequently in studies involving clustered, hierarchical and spatial designs. This class of models allows flexible functional dependence of an outcome variable on covariates by using nonparametric regression, while accounting for correlation between observations by using random effects. We estimate no...
Bayesian Inference for Spatial Beta Generalized Linear Mixed Models
In some applications, the response variable assumes values in the unit interval. The standard linear regression model is not appropriate for modelling this type of data because the normality assumption is not met. Alternatively, the beta regression model has been introduced to analyze such observations. A beta distribution represents a flexible density family on the (0, 1) interval that covers symm...
Asynchronous Distributed Estimation of Topic Models for Document Analysis
Given the prevalence of large data sets and the availability of inexpensive parallel computing hardware, there is significant motivation to explore distributed implementations of statistical learning algorithms. In this paper, we present a distributed learning framework for Latent Dirichlet Allocation (LDA), a well-known Bayesian latent variable model for sparse matrices of count data. In the p...
Journal title:
Volume 103, Issue
Pages: -
Publication date: 2016